Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 10000 |
| Missing cells | 8338 |
| Missing cells (%) | 6.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.1 MiB |
| Average record size in memory | 112.0 B |
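As a quick consistency check, the missing-cell percentage can be recomputed from the counts in the table above. This is a minimal sketch that assumes the percentage is the number of missing cells divided by variables times observations; the variable names are illustrative and not taken from the report.

```python
# Values copied from the "Dataset statistics" table.
n_variables = 14
n_observations = 10_000
missing_cells = 8_338

# Assumption: every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_fraction = missing_cells / total_cells

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_fraction:.1%}")
```

Under that assumption the result rounds to the 6.0% reported in the table.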
Variable types
| Numeric | 10 |
|---|---|
| Categorical | 4 |
| product_name has a high cardinality: 9268 distinct values | High cardinality |
| brands has a high cardinality: 5586 distinct values | High cardinality |
| ingredients_text has a high cardinality: 8858 distinct values | High cardinality |
| df_index is highly correlated with Unnamed: 0 | High correlation |
| Unnamed: 0 is highly correlated with df_index | High correlation |
| product_name has 104 (1.0%) missing values | Missing |
| brands has 206 (2.1%) missing values | Missing |
| ingredients_text has 885 (8.8%) missing values | Missing |
| nutrition_grade_fr has 1416 (14.2%) missing values | Missing |
| fat_100g has 560 (5.6%) missing values | Missing |
| saturated-fat_100g has 1095 (10.9%) missing values | Missing |
| carbohydrates_100g has 567 (5.7%) missing values | Missing |
| sugars_100g has 583 (5.8%) missing values | Missing |
| fiber_100g has 2551 (25.5%) missing values | Missing |
| salt_100g has 228 (2.3%) missing values | Missing |
| product_name is uniformly distributed | Uniform |
| ingredients_text is uniformly distributed | Uniform |
| df_index has unique values | Unique |
| Unnamed: 0 has unique values | Unique |
| energy_100g has 308 (3.1%) zeros | Zeros |
| fat_100g has 2285 (22.9%) zeros | Zeros |
| saturated-fat_100g has 2505 (25.1%) zeros | Zeros |
| carbohydrates_100g has 733 (7.3%) zeros | Zeros |
| sugars_100g has 1342 (13.4%) zeros | Zeros |
| fiber_100g has 2487 (24.9%) zeros | Zeros |
| proteins_100g has 1895 (18.9%) zeros | Zeros |
| salt_100g has 1282 (12.8%) zeros | Zeros |
Reproduction
| Analysis started | 2021-03-22 13:56:06.627681 |
|---|---|
| Analysis finished | 2021-03-22 13:56:25.545161 |
| Duration | 18.92 seconds |
| Software version | pandas-profiling v2.12.0 |
| Download configuration | config.yaml |
| Distinct | 10000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 146760.3517 |
| Minimum | 6 |
|---|---|
| Maximum | 296434 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 6 |
|---|---|
| 5-th percentile | 14921.65 |
| Q1 | 72413.25 |
| median | 146395.5 |
| Q3 | 219896.5 |
| 95-th percentile | 282103.05 |
| Maximum | 296434 |
| Range | 296428 |
| Interquartile range (IQR) | 147483.25 |
Descriptive statistics
| Standard deviation | 85676.0385 |
|---|---|
| Coefficient of variation (CV) | 0.5837819105 |
| Kurtosis | -1.202221965 |
| Mean | 146760.3517 |
| Median Absolute Deviation (MAD) | 73758.5 |
| Skewness | 0.02868457431 |
| Sum | 1467603517 |
| Variance | 7340383572 |
| Monotonicity | Not monotonic |
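Several of the figures in the quantile and descriptive tables above are related by simple arithmetic. The sketch below recomputes the coefficient of variation, variance, interquartile range and range from the reported mean, standard deviation, quartiles and extremes. It uses the standard textbook definitions, which are assumed (not confirmed by the report) to match how the report derives them.

```python
import math

# Values copied from the tables of the numeric variable summarized above.
mean = 146760.3517
std = 85676.0385
q1, q3 = 72413.25, 219896.5
minimum, maximum = 6, 296434

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV       = {cv:.10f}")
print(f"Variance = {variance:.0f}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
```

The recomputed values agree with the reported ones up to the rounding of the inputs.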
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 131072 | 1 | < 0.1% |
| 217831 | 1 | < 0.1% |
| 78555 | 1 | < 0.1% |
| 51932 | 1 | < 0.1% |
| 230109 | 1 | < 0.1% |
| 219870 | 1 | < 0.1% |
| 125202 | 1 | < 0.1% |
| 15074 | 1 | < 0.1% |
| 198386 | 1 | < 0.1% |
| 109288 | 1 | < 0.1% |
| Other values (9990) | 9990 |
| Value | Count | Frequency (%) |
| 6 | 1 | |
| 20 | 1 | |
| 24 | 1 | |
| 70 | 1 | |
| 137 | 1 |
| Value | Count | Frequency (%) |
| 296434 | 1 | |
| 296391 | 1 | |
| 296383 | 1 | |
| 296360 | 1 | |
| 296343 | 1 |
| Distinct | 10000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 160841.9487 |
| Minimum | 7 |
|---|---|
| Maximum | 355884 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 7 |
|---|---|
| 5-th percentile | 15438.65 |
| Q1 | 74826.25 |
| median | 151847.5 |
| Q3 | 242296.5 |
| 95-th percentile | 331935.05 |
| Maximum | 355884 |
| Range | 355877 |
| Interquartile range (IQR) | 167470.25 |
Descriptive statistics
| Standard deviation | 100420.8039 |
|---|---|
| Coefficient of variation (CV) | 0.6243446112 |
| Kurtosis | -1.105132484 |
| Mean | 160841.9487 |
| Median Absolute Deviation (MAD) | 82982.5 |
| Skewness | 0.2208738151 |
| Sum | 1608419487 |
| Variance | 1.008433786 × 10^10 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 86018 | 1 | < 0.1% |
| 4831 | 1 | < 0.1% |
| 37591 | 1 | < 0.1% |
| 223960 | 1 | < 0.1% |
| 315526 | 1 | < 0.1% |
| 129754 | 1 | < 0.1% |
| 169945 | 1 | < 0.1% |
| 33501 | 1 | < 0.1% |
| 285406 | 1 | < 0.1% |
| 92896 | 1 | < 0.1% |
| Other values (9990) | 9990 |
| Value | Count | Frequency (%) |
| 7 | 1 | |
| 21 | 1 | |
| 25 | 1 | |
| 75 | 1 | |
| 144 | 1 |
| Value | Count | Frequency (%) |
| 355884 | 1 | |
| 355812 | 1 | |
| 355803 | 1 | |
| 355771 | 1 | |
| 355725 | 1 |
| Distinct | 9268 |
|---|---|
| Distinct (%) | 93.7% |
| Missing | 104 |
| Missing (%) | 1.0% |
| Memory size | 78.2 KiB |
| Ice Cream | 16 |
|---|---|
| Extra Virgin Olive Oil | 11 |
| Cookies | 9 |
| Potato Chips | 7 |
| Coconut Water | 7 |
| Other values (9263) |
Length
| Max length | 161 |
|---|---|
| Median length | 24 |
| Mean length | 26.63156831 |
| Min length | 3 |
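The mean length for product_name can be cross-checked against other figures in the report. This sketch assumes that missing values are excluded from the length statistics, so the mean is the total character count divided by the number of non-missing values; the variable names are illustrative.

```python
# Values copied from the product_name section of the report.
n_observations = 10_000
n_missing = 104
total_characters = 263_546  # reported under "Characters and Unicode"

# Assumption: only non-missing values contribute to the length statistics.
n_present = n_observations - n_missing
mean_length = total_characters / n_present

print(f"non-missing values: {n_present}")
print(f"mean length:        {mean_length:.8f}")
```

The result matches the reported mean length of 26.63156831, which supports the assumption.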
Characters and Unicode
| Total characters | 263546 |
|---|---|
| Distinct characters | 234 |
| Distinct categories | 16 |
| Distinct scripts | 5 |
| Distinct blocks | 6 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
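To illustrate the idea, the following sketch counts Unicode general categories over a few strings using only the Python standard library. It is not necessarily how the report computes its tables: `unicodedata` exposes general categories but not scripts or blocks, and the helper name `category_counts` is hypothetical. The two strings are taken from the report's sample rows for this variable.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over an iterable of strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            # e.g. "Lu" = Uppercase Letter, "Ll" = Lowercase Letter,
            # "Zs" = Space Separator
            counts[unicodedata.category(ch)] += 1
    return counts


sample = ["Premium Ice Cream", "Corn Flakes Glacés au sucre"]
counts = category_counts(sample)
for category, count in counts.most_common():
    print(category, count)
```

Applied to a whole column, the same loop would yield a table of the kind shown under "Most occurring categories".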
Unique
| Unique | 8864 |
|---|---|
| Unique (%) | 89.6% |
Sample
| 1st row | Premium Ice Cream |
|---|---|
| 2nd row | Kohler, Strawberry Balsamic Rare Facets Chocolate |
| 3rd row | Center Cut Chops Boneless Thin Pork |
| 4th row | Corn Flakes Glacés au sucre |
| 5th row | Homekist, Sandwich Cookies, Lemon Creme |
| Value | Count | Frequency (%) |
| Ice Cream | 16 | 0.2% |
| Extra Virgin Olive Oil | 11 | 0.1% |
| Cookies | 9 | 0.1% |
| Potato Chips | 7 | 0.1% |
| Coconut Water | 7 | 0.1% |
| Tomato Sauce | 6 | 0.1% |
| Syrup | 6 | 0.1% |
| Juice | 6 | 0.1% |
| Trail Mix | 6 | 0.1% |
| Spaghetti | 6 | 0.1% |
| Other values (9258) | 9816 | |
| (Missing) | 104 | 1.0% |
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| de | 958 | 2.4% |
| | 881 | 2.2% |
| chocolate | 405 | 1.0% |
| sauce | 351 | 0.9% |
| cheese | 347 | 0.9% |
| organic | 309 | 0.8% |
| au | 302 | 0.7% |
| with | 287 | 0.7% |
| mix | 263 | 0.6% |
| à | 243 | 0.6% |
| Other values (7772) | 36261 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 30812 | 11.7% |
| e | 26038 | 9.9% |
| a | 20024 | 7.6% |
| r | 15420 | 5.9% |
| i | 15153 | 5.7% |
| o | 13556 | 5.1% |
| t | 12219 | 4.6% |
| n | 11827 | 4.5% |
| s | 11350 | 4.3% |
| l | 10768 | 4.1% |
| Other values (224) | 96379 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 191159 | |
| Uppercase Letter | 34235 | 13.0% |
| Space Separator | 30813 | 11.7% |
| Other Punctuation | 4862 | 1.8% |
| Decimal Number | 1555 | 0.6% |
| Dash Punctuation | 498 | 0.2% |
| Open Punctuation | 174 | 0.1% |
| Close Punctuation | 170 | 0.1% |
| Math Symbol | 45 | < 0.1% |
| Other Letter | 11 | < 0.1% |
| Other values (6) | 24 | < 0.1% |
Most frequent character per category
| Value | Count | Frequency (%) |
| e | 26038 | |
| a | 20024 | |
| r | 15420 | 8.1% |
| i | 15153 | 7.9% |
| o | 13556 | 7.1% |
| t | 12219 | 6.4% |
| n | 11827 | 6.2% |
| s | 11350 | 5.9% |
| l | 10768 | 5.6% |
| u | 8436 | 4.4% |
| Other values (98) | 46368 |
| Value | Count | Frequency (%) |
| C | 5311 | |
| S | 4127 | |
| P | 2971 | 8.7% |
| B | 2745 | 8.0% |
| M | 2165 | 6.3% |
| F | 1731 | 5.1% |
| T | 1475 | 4.3% |
| G | 1455 | 4.3% |
| A | 1271 | 3.7% |
| O | 1271 | 3.7% |
| Other values (56) | 9713 |
| Value | Count | Frequency (%) |
| , | 3033 | |
| & | 710 | 14.6% |
| ' | 537 | 11.0% |
| % | 229 | 4.7% |
| . | 147 | 3.0% |
| ; | 79 | 1.6% |
| / | 49 | 1.0% |
| ! | 38 | 0.8% |
| : | 21 | 0.4% |
| * | 5 | 0.1% |
| Other values (7) | 14 | 0.3% |
| Value | Count | Frequency (%) |
| 0 | 440 | |
| 1 | 259 | |
| 2 | 232 | |
| 5 | 145 | 9.3% |
| 4 | 130 | 8.4% |
| 3 | 115 | 7.4% |
| 6 | 94 | 6.0% |
| 8 | 72 | 4.6% |
| 7 | 40 | 2.6% |
| 9 | 28 | 1.8% |
| Value | Count | Frequency (%) |
| โ | 2 | |
| ก | 2 | |
| ค | 1 | |
| ซ | 1 | |
| ร | 1 | |
| ล | 1 | |
| น | 1 | |
| ส | 1 | |
| ม | 1 |
| Value | Count | Frequency (%) |
| ( | 169 | |
| [ | 2 | 1.1% |
| { | 2 | 1.1% |
| „ | 1 | 0.6% |
| Value | Count | Frequency (%) |
| ้ | 2 | |
| ่ | 2 | |
| ี | 1 | |
| ิ | 1 |
| Value | Count | Frequency (%) |
| + | 41 | |
| < | 2 | 4.4% |
| > | 2 | 4.4% |
| Value | Count | Frequency (%) |
| ® | 3 | |
| ° | 3 | |
| № | 2 |
| Value | Count | Frequency (%) |
| (space) | 30812 | |
| (other space separator) | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| ) | 168 | |
| ] | 2 | 1.2% |
| Value | Count | Frequency (%) |
| « | 2 | |
| “ | 1 |
| Value | Count | Frequency (%) |
| - | 498 |
| Value | Count | Frequency (%) |
| » | 2 |
| Value | Count | Frequency (%) |
| $ | 4 |
| Value | Count | Frequency (%) |
| ´ | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 224487 | |
| Common | 38135 | 14.5% |
| Cyrillic | 813 | 0.3% |
| Greek | 94 | < 0.1% |
| Thai | 17 | < 0.1% |
Most frequent character per script
| Value | Count | Frequency (%) |
| e | 26038 | 11.6% |
| a | 20024 | 8.9% |
| r | 15420 | 6.9% |
| i | 15153 | 6.8% |
| o | 13556 | 6.0% |
| t | 12219 | 5.4% |
| n | 11827 | 5.3% |
| s | 11350 | 5.1% |
| l | 10768 | 4.8% |
| u | 8436 | 3.8% |
| Other values (77) | 79696 |
| Value | Count | Frequency (%) |
| о | 95 | 11.7% |
| а | 65 | 8.0% |
| и | 64 | 7.9% |
| р | 59 | 7.3% |
| н | 55 | 6.8% |
| е | 54 | 6.6% |
| к | 43 | 5.3% |
| с | 42 | 5.2% |
| л | 40 | 4.9% |
| т | 27 | 3.3% |
| Other values (38) | 269 |
| Value | Count | Frequency (%) |
| (space) | 30812 | |
| , | 3033 | 8.0% |
| & | 710 | 1.9% |
| ' | 537 | 1.4% |
| - | 498 | 1.3% |
| 0 | 440 | 1.2% |
| 1 | 259 | 0.7% |
| 2 | 232 | 0.6% |
| % | 229 | 0.6% |
| ( | 169 | 0.4% |
| Other values (37) | 1216 | 3.2% |
| Value | Count | Frequency (%) |
| ο | 7 | 7.4% |
| α | 5 | 5.3% |
| ς | 5 | 5.3% |
| μ | 5 | 5.3% |
| Ο | 5 | 5.3% |
| ρ | 4 | 4.3% |
| ν | 4 | 4.3% |
| Κ | 3 | 3.2% |
| ε | 3 | 3.2% |
| λ | 3 | 3.2% |
| Other values (29) | 50 |
| Value | Count | Frequency (%) |
| โ | 2 | |
| ้ | 2 | |
| ก | 2 | |
| ่ | 2 | |
| ค | 1 | 5.9% |
| ซ | 1 | 5.9% |
| ี | 1 | 5.9% |
| ร | 1 | 5.9% |
| ล | 1 | 5.9% |
| ิ | 1 | 5.9% |
| Other values (3) | 3 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 260225 | |
| None | 2485 | 0.9% |
| Cyrillic | 813 | 0.3% |
| Thai | 17 | < 0.1% |
| Punctuation | 4 | < 0.1% |
| Letterlike Symbols | 2 | < 0.1% |
Most frequent character per block
| Value | Count | Frequency (%) |
| (space) | 30812 | 11.8% |
| e | 26038 | 10.0% |
| a | 20024 | 7.7% |
| r | 15420 | 5.9% |
| i | 15153 | 5.8% |
| o | 13556 | 5.2% |
| t | 12219 | 4.7% |
| n | 11827 | 4.5% |
| s | 11350 | 4.4% |
| l | 10768 | 4.1% |
| Other values (78) | 93058 |
| Value | Count | Frequency (%) |
| é | 1351 | |
| à | 241 | 9.7% |
| è | 201 | 8.1% |
| â | 109 | 4.4% |
| ê | 67 | 2.7% |
| ü | 48 | 1.9% |
| ä | 46 | 1.9% |
| û | 43 | 1.7% |
| ô | 35 | 1.4% |
| É | 35 | 1.4% |
| Other values (70) | 309 | 12.4% |
| Value | Count | Frequency (%) |
| о | 95 | 11.7% |
| а | 65 | 8.0% |
| и | 64 | 7.9% |
| р | 59 | 7.3% |
| н | 55 | 6.8% |
| е | 54 | 6.6% |
| к | 43 | 5.3% |
| с | 42 | 5.2% |
| л | 40 | 4.9% |
| т | 27 | 3.3% |
| Other values (38) | 269 |
| Value | Count | Frequency (%) |
| № | 2 |
| Value | Count | Frequency (%) |
| „ | 1 | |
| “ | 1 | |
| • | 1 | |
| … | 1 |
| Value | Count | Frequency (%) |
| โ | 2 | |
| ้ | 2 | |
| ก | 2 | |
| ่ | 2 | |
| ค | 1 | 5.9% |
| ซ | 1 | 5.9% |
| ี | 1 | 5.9% |
| ร | 1 | 5.9% |
| ล | 1 | 5.9% |
| ิ | 1 | 5.9% |
| Other values (3) | 3 |
| Distinct | 5586 |
|---|---|
| Distinct (%) | 57.0% |
| Missing | 206 |
| Missing (%) | 2.1% |
| Memory size | 78.2 KiB |
| Auchan | 107 |
|---|---|
| Carrefour | 104 |
| U | 81 |
| Leader Price | 72 |
| Casino | 65 |
| Other values (5581) |
Length
| Max length | 106 |
|---|---|
| Median length | 12 |
| Mean length | 15.12824178 |
| Min length | 1 |
Characters and Unicode
| Total characters | 148166 |
|---|---|
| Distinct characters | 182 |
| Distinct categories | 16 |
| Distinct scripts | 5 |
| Distinct blocks | 6 |
Unique
| Unique | 4359 |
|---|---|
| Unique (%) | 44.5% |
Sample
| 1st row | Casper's Ice Cream Inc. |
|---|---|
| 2nd row | Kohler Original Recipe Chocolates |
| 3rd row | Target Stores |
| 4th row | Crownfield |
| 5th row | Vista Bakery Inc. |
| Value | Count | Frequency (%) |
| Auchan | 107 | 1.1% |
| Carrefour | 104 | 1.0% |
| U | 81 | 0.8% |
| Leader Price | 72 | 0.7% |
| Casino | 65 | 0.7% |
| Meijer | 64 | 0.6% |
| Kroger | 54 | 0.5% |
| Spartan | 44 | 0.4% |
| Roundy's | 41 | 0.4% |
| Great Value | 39 | 0.4% |
| Other values (5576) | 9123 | |
| (Missing) | 206 | 2.1% |
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| inc | 1259 | 5.6% |
| foods | 494 | 2.2% |
| | 312 | 1.4% |
| company | 311 | 1.4% |
| llc | 283 | 1.3% |
| food | 256 | 1.1% |
| co | 253 | 1.1% |
| the | 178 | 0.8% |
| market | 143 | 0.6% |
| carrefour | 141 | 0.6% |
| Other values (5850) | 18693 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 14709 | 9.9% |
| e | 12563 | 8.5% |
| a | 10687 | 7.2% |
| r | 9788 | 6.6% |
| o | 9466 | 6.4% |
| n | 8239 | 5.6% |
| i | 7613 | 5.1% |
| s | 6470 | 4.4% |
| t | 5645 | 3.8% |
| l | 5486 | 3.7% |
| Other values (172) | 57500 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 104204 | |
| Uppercase Letter | 23051 | 15.6% |
| Space Separator | 14709 | 9.9% |
| Other Punctuation | 5490 | 3.7% |
| Dash Punctuation | 354 | 0.2% |
| Decimal Number | 252 | 0.2% |
| Open Punctuation | 39 | < 0.1% |
| Close Punctuation | 39 | < 0.1% |
| Math Symbol | 8 | < 0.1% |
| Other Letter | 8 | < 0.1% |
| Other values (6) | 12 | < 0.1% |
Most frequent character per category
| Value | Count | Frequency (%) |
| e | 12563 | |
| a | 10687 | |
| r | 9788 | |
| o | 9466 | 9.1% |
| n | 8239 | 7.9% |
| i | 7613 | 7.3% |
| s | 6470 | 6.2% |
| t | 5645 | 5.4% |
| l | 5486 | 5.3% |
| c | 4499 | 4.3% |
| Other values (77) | 23748 |
| Value | Count | Frequency (%) |
| C | 2415 | 10.5% |
| S | 1976 | 8.6% |
| F | 1751 | 7.6% |
| I | 1655 | 7.2% |
| M | 1631 | 7.1% |
| B | 1501 | 6.5% |
| L | 1410 | 6.1% |
| P | 1187 | 5.1% |
| A | 1026 | 4.5% |
| T | 969 | 4.2% |
| Other values (45) | 7530 |
| Value | Count | Frequency (%) |
| . | 1909 | |
| , | 1865 | |
| ' | 970 | |
| & | 326 | 5.9% |
| / | 305 | 5.6% |
| : | 73 | 1.3% |
| ! | 34 | 0.6% |
| " | 4 | 0.1% |
| % | 3 | 0.1% |
| @ | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 3 | 46 | |
| 5 | 44 | |
| 6 | 39 | |
| 1 | 30 | |
| 0 | 25 | |
| 2 | 20 | |
| 7 | 18 | 7.1% |
| 4 | 13 | 5.2% |
| 8 | 10 | 4.0% |
| 9 | 7 | 2.8% |
| Value | Count | Frequency (%) |
| โ | 1 | |
| ค | 1 | |
| ก | 1 | |
| แ | 1 | |
| ฟ | 1 | |
| น | 1 | |
| ต | 1 | |
| า | 1 |
| Value | Count | Frequency (%) |
| - | 353 | |
| — | 1 | 0.3% |
| Value | Count | Frequency (%) |
| (space) | 14709 | |
| Value | Count | Frequency (%) |
| № | 1 |
| Value | Count | Frequency (%) |
| ( | 39 |
| Value | Count | Frequency (%) |
| ) | 39 |
| Value | Count | Frequency (%) |
| + | 8 |
| Value | Count | Frequency (%) |
| $ | 4 |
| Value | Count | Frequency (%) |
| ้ | 2 |
| Value | Count | Frequency (%) |
| ` | 1 |
| Value | Count | Frequency (%) |
| « | 2 |
| Value | Count | Frequency (%) |
| » | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 126920 | |
| Common | 20901 | 14.1% |
| Cyrillic | 324 | 0.2% |
| Greek | 11 | < 0.1% |
| Thai | 10 | < 0.1% |
Most frequent character per script
| Value | Count | Frequency (%) |
| e | 12563 | 9.9% |
| a | 10687 | 8.4% |
| r | 9788 | 7.7% |
| o | 9466 | 7.5% |
| n | 8239 | 6.5% |
| i | 7613 | 6.0% |
| s | 6470 | 5.1% |
| t | 5645 | 4.4% |
| l | 5486 | 4.3% |
| c | 4499 | 3.5% |
| Other values (71) | 46464 |
| Value | Count | Frequency (%) |
| а | 27 | 8.3% |
| о | 27 | 8.3% |
| е | 22 | 6.8% |
| р | 21 | 6.5% |
| н | 19 | 5.9% |
| и | 18 | 5.6% |
| с | 16 | 4.9% |
| к | 14 | 4.3% |
| л | 13 | 4.0% |
| т | 12 | 3.7% |
| Other values (40) | 135 |
| Value | Count | Frequency (%) |
| (space) | 14709 | |
| . | 1909 | 9.1% |
| , | 1865 | 8.9% |
| ' | 970 | 4.6% |
| - | 353 | 1.7% |
| & | 326 | 1.6% |
| / | 305 | 1.5% |
| : | 73 | 0.3% |
| 3 | 46 | 0.2% |
| 5 | 44 | 0.2% |
| Other values (21) | 301 | 1.4% |
| Value | Count | Frequency (%) |
| Δ | 1 | |
| ω | 1 | |
| δ | 1 | |
| ώ | 1 | |
| ν | 1 | |
| η | 1 | |
| Ε | 1 | |
| λ | 1 | |
| α | 1 | |
| ϊ | 1 |
| Value | Count | Frequency (%) |
| ้ | 2 | |
| โ | 1 | |
| ค | 1 | |
| ก | 1 | |
| แ | 1 | |
| ฟ | 1 | |
| น | 1 | |
| ต | 1 | |
| า | 1 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 147196 | |
| None | 634 | 0.4% |
| Cyrillic | 324 | 0.2% |
| Thai | 10 | < 0.1% |
| Letterlike Symbols | 1 | < 0.1% |
| Punctuation | 1 | < 0.1% |
Most frequent character per block
| Value | Count | Frequency (%) |
| (space) | 14709 | 10.0% |
| e | 12563 | 8.5% |
| a | 10687 | 7.3% |
| r | 9788 | 6.6% |
| o | 9466 | 6.4% |
| n | 8239 | 5.6% |
| i | 7613 | 5.2% |
| s | 6470 | 4.4% |
| t | 5645 | 3.8% |
| l | 5486 | 3.7% |
| Other values (69) | 56530 |
| Value | Count | Frequency (%) |
| é | 349 | |
| è | 122 | 19.2% |
| ô | 22 | 3.5% |
| ü | 21 | 3.3% |
| ó | 18 | 2.8% |
| ê | 12 | 1.9% |
| É | 9 | 1.4% |
| î | 8 | 1.3% |
| ä | 8 | 1.3% |
| í | 7 | 1.1% |
| Other values (32) | 58 | 9.1% |
| Value | Count | Frequency (%) |
| а | 27 | 8.3% |
| о | 27 | 8.3% |
| е | 22 | 6.8% |
| р | 21 | 6.5% |
| н | 19 | 5.9% |
| и | 18 | 5.6% |
| с | 16 | 4.9% |
| к | 14 | 4.3% |
| л | 13 | 4.0% |
| т | 12 | 3.7% |
| Other values (40) | 135 |
| Value | Count | Frequency (%) |
| № | 1 |
| Value | Count | Frequency (%) |
| ้ | 2 | |
| โ | 1 | |
| ค | 1 | |
| ก | 1 | |
| แ | 1 | |
| ฟ | 1 | |
| น | 1 | |
| ต | 1 | |
| า | 1 |
| Value | Count | Frequency (%) |
| — | 1 |
| Distinct | 8858 |
|---|---|
| Distinct (%) | 97.2% |
| Missing | 885 |
| Missing (%) | 8.8% |
| Memory size | 78.2 KiB |
| Extra virgin olive oil | 8 |
|---|---|
| Semoule de _blé_ dur de qualité supérieure. | 7 |
| Semolina (wheat), durum flour (wheat), niacin, ferrous sulfate (iron), thiamin mononitrate, riboflavin, folic acid. | 7 |
| Extra virgin olive oil. | 7 |
| Almonds. | 7 |
| Other values (8853) |
Length
| Max length | 4468 |
|---|---|
| Median length | 183 |
| Mean length | 215.7068568 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1966168 |
|---|---|
| Distinct characters | 275 |
| Distinct categories | 20 |
| Distinct scripts | 5 |
| Distinct blocks | 7 |
Unique
| Unique | 8696 |
|---|---|
| Unique (%) | 95.4% |
Sample
| 1st row | Fresh whole milk, sugar, cream, mint flake (corn syrup, sugar, partially hydrogenated soybean oil, red lake 40, green 3, red 40, peppermint oil, soy lecithin), corn syrup solids, non-fat dry milk, stabilizer (microcrystalline cellulose, cellulose gum, mon |
|---|---|
| 2nd row | 61% chocolate (cocoa beans, pure cane sugar, cocoa butter, soy lecithin, vanilla bean), milk chocolate (pure cane sugar, full cream milk, cocoa butter, cocoa beans, soy lecithin, vanilla bean), strawberry puree (strawberry 87%, invert sugar syrup), 55% ch |
| 3rd row | Pork, *solution ingredients: water, potassium lactate, sodium phosphates, salt, diacetate. |
| 4th row | Maïs 77% Sucre 28% Extrait de Malte d'orge Sel |
| 5th row | Enriched flour (wheat flour, niacin, reduced iron, thiamine mononitrate, riboflavin, folic acid), sugar, vegetable oil (contains one or more of the following: canola oil, corn oil, palm oil, soybean oil), dextrose, high fructose corn syrup, corn syrup, co |
| Value | Count | Frequency (%) |
| Extra virgin olive oil | 8 | 0.1% |
| Semoule de _blé_ dur de qualité supérieure. | 7 | 0.1% |
| Semolina (wheat), durum flour (wheat), niacin, ferrous sulfate (iron), thiamin mononitrate, riboflavin, folic acid. | 7 | 0.1% |
| Extra virgin olive oil. | 7 | 0.1% |
| Almonds. | 7 | 0.1% |
| Pecans. | 7 | 0.1% |
| Durum semolina, niacin, ferrous sulfate (iron), thiamine mononitrate, riboflavin, folic acid. | 6 | 0.1% |
| Soybean oil. | 6 | 0.1% |
| Pasteurized part-skim milk, cheese culture, salt, enzymes. | 5 | 0.1% |
| Semolina, niacin, ferrous sulfate (iron), thiamin mononitrate, riboflavin, folic acid. | 5 | 0.1% |
| Other values (8848) | 9050 | |
| (Missing) | 885 | 8.8% |
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| de | 13304 | 4.8% |
| | 7962 | 2.9% |
| salt | 4571 | 1.6% |
| sugar | 3690 | 1.3% |
| oil | 3313 | 1.2% |
| acid | 3231 | 1.2% |
| water | 3135 | 1.1% |
| flour | 2801 | 1.0% |
| and | 2612 | 0.9% |
| organic | 2544 | 0.9% |
| Other values (16648) | 230087 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 268919 | 13.7% |
| e | 168547 | 8.6% |
| a | 141767 | 7.2% |
| r | 116227 | 5.9% |
| i | 112668 | 5.7% |
| o | 105970 | 5.4% |
| t | 98060 | 5.0% |
| s | 90269 | 4.6% |
| , | 89685 | 4.6% |
| n | 87755 | 4.5% |
| Other values (265) | 686301 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 1471189 | |
| Space Separator | 268935 | 13.7% |
| Other Punctuation | 121601 | 6.2% |
| Uppercase Letter | 37029 | 1.9% |
| Decimal Number | 25346 | 1.3% |
| Open Punctuation | 15680 | 0.8% |
| Close Punctuation | 14777 | 0.8% |
| Connector Punctuation | 7774 | 0.4% |
| Dash Punctuation | 3361 | 0.2% |
| Math Symbol | 198 | < 0.1% |
| Other values (10) | 278 | < 0.1% |
Most frequent character per category
| Value | Count | Frequency (%) |
| e | 168547 | |
| a | 141767 | 9.6% |
| r | 116227 | 7.9% |
| i | 112668 | 7.7% |
| o | 105970 | 7.2% |
| t | 98060 | 6.7% |
| s | 90269 | 6.1% |
| n | 87755 | 6.0% |
| l | 83430 | 5.7% |
| c | 77494 | 5.3% |
| Other values (119) | 389002 |
| Value | Count | Frequency (%) |
| E | 4810 | 13.0% |
| S | 3412 | 9.2% |
| C | 2786 | 7.5% |
| A | 2397 | 6.5% |
| P | 2267 | 6.1% |
| I | 2103 | 5.7% |
| O | 1862 | 5.0% |
| T | 1706 | 4.6% |
| R | 1629 | 4.4% |
| F | 1446 | 3.9% |
| Other values (40) | 12611 |
| Value | Count | Frequency (%) |
| น | 5 | 8.5% |
| า | 5 | 8.5% |
| ร | 5 | 8.5% |
| ต | 4 | 6.8% |
| ส | 4 | 6.8% |
| ค | 4 | 6.8% |
| เ | 4 | 6.8% |
| ล | 3 | 5.1% |
| ว | 3 | 5.1% |
| ม | 3 | 5.1% |
| Other values (14) | 19 |
| Value | Count | Frequency (%) |
| , | 89685 | |
| . | 9858 | 8.1% |
| % | 6701 | 5.5% |
| : | 5935 | 4.9% |
| * | 2696 | 2.2% |
| ' | 2568 | 2.1% |
| / | 1516 | 1.2% |
| ; | 980 | 0.8% |
| & | 718 | 0.6% |
| # | 374 | 0.3% |
| Other values (10) | 570 | 0.5% |
| Value | Count | Frequency (%) |
| 0 | 4394 | |
| 1 | 4317 | |
| 2 | 3701 | |
| 5 | 2837 | |
| 3 | 2439 | |
| 4 | 2423 | |
| 6 | 1888 | |
| 7 | 1280 | 5.1% |
| 8 | 1141 | 4.5% |
| 9 | 926 | 3.7% |
| Value | Count | Frequency (%) |
| ี | 3 | |
| ั | 3 | |
| ้ | 2 | |
| ุ | 2 | |
| ่ | 2 | |
| ิ | 2 | |
| ็ | 1 | 6.2% |
| ์ | 1 | 6.2% |
| Value | Count | Frequency (%) |
| + | 152 | |
| = | 33 | 16.7% |
| | | 5 | 2.5% |
| ± | 5 | 2.5% |
| < | 3 | 1.5% |
| Value | Count | Frequency (%) |
| ( | 14115 | |
| [ | 1407 | 9.0% |
| { | 151 | 1.0% |
| „ | 7 | < 0.1% |
| Value | Count | Frequency (%) |
| ) | 13389 | |
| ] | 1249 | 8.5% |
| } | 139 | 0.9% |
| Value | Count | Frequency (%) |
| ’ | 104 | |
| » | 4 | 3.7% |
| ” | 1 | 0.9% |
| Value | Count | Frequency (%) |
| “ | 17 | |
| ‘ | 16 | |
| « | 8 |
| Value | Count | Frequency (%) |
| $ | 21 | |
| € | 15 | |
| ¤ | 1 | 2.7% |
| Value | Count | Frequency (%) |
| (space) | 268919 | |
| (other space separators) | 16 | < 0.1% |
| Value | Count | Frequency (%) |
| - | 3328 | |
| — | 33 | 1.0% |
| Value | Count | Frequency (%) |
| ¹ | 2 | |
| ₁ | 1 |
| Value | Count | Frequency (%) |
| | 2 | |
| | 1 |
| Value | Count | Frequency (%) |
| ° | 4 | |
| ® | 4 |
| Value | Count | Frequency (%) |
| _ | 7774 |
| Value | Count | Frequency (%) |
| ` | 1 |
| Value | Count | Frequency (%) |
| | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1505009 | |
| Common | 457875 | 23.3% |
| Cyrillic | 2961 | 0.2% |
| Greek | 248 | < 0.1% |
| Thai | 75 | < 0.1% |
Most frequent character per script
| Value | Count | Frequency (%) |
| e | 168547 | 11.2% |
| a | 141767 | 9.4% |
| r | 116227 | 7.7% |
| i | 112668 | 7.5% |
| o | 105970 | 7.0% |
| t | 98060 | 6.5% |
| s | 90269 | 6.0% |
| n | 87755 | 5.8% |
| l | 83430 | 5.5% |
| c | 77494 | 5.1% |
| Other values (103) | 422822 |
| Value | Count | Frequency (%) |
| (space) | 268919 | |
| , | 89685 | 19.6% |
| ( | 14115 | 3.1% |
| ) | 13389 | 2.9% |
| . | 9858 | 2.2% |
| _ | 7774 | 1.7% |
| % | 6701 | 1.5% |
| : | 5935 | 1.3% |
| 0 | 4394 | 1.0% |
| 1 | 4317 | 0.9% |
| Other values (54) | 32788 | 7.2% |
| Value | Count | Frequency (%) |
| о | 351 | 11.9% |
| а | 312 | 10.5% |
| н | 215 | 7.3% |
| е | 201 | 6.8% |
| и | 188 | 6.3% |
| л | 178 | 6.0% |
| р | 170 | 5.7% |
| т | 148 | 5.0% |
| с | 145 | 4.9% |
| к | 137 | 4.6% |
| Other values (24) | 916 |
| Value | Count | Frequency (%) |
| α | 24 | 9.7% |
| τ | 23 | 9.3% |
| ο | 21 | 8.5% |
| ι | 19 | 7.7% |
| κ | 17 | 6.9% |
| ρ | 13 | 5.2% |
| ά | 12 | 4.8% |
| λ | 12 | 4.8% |
| υ | 11 | 4.4% |
| σ | 9 | 3.6% |
| Other values (22) | 87 |
| Value | Count | Frequency (%) |
| น | 5 | 6.7% |
| า | 5 | 6.7% |
| ร | 5 | 6.7% |
| ต | 4 | 5.3% |
| ส | 4 | 5.3% |
| ค | 4 | 5.3% |
| เ | 4 | 5.3% |
| ล | 3 | 4.0% |
| ว | 3 | 4.0% |
| ม | 3 | 4.0% |
| Other values (22) | 35 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1941759 | |
| None | 20955 | 1.1% |
| Cyrillic | 2961 | 0.2% |
| Punctuation | 368 | < 0.1% |
| Thai | 75 | < 0.1% |
| Alphabetic PF | 35 | < 0.1% |
| Currency Symbols | 15 | < 0.1% |
Most frequent character per block
| Value | Count | Frequency (%) |
| (space) | 268919 | |
| e | 168547 | 8.7% |
| a | 141767 | 7.3% |
| r | 116227 | 6.0% |
| i | 112668 | 5.8% |
| o | 105970 | 5.5% |
| t | 98060 | 5.1% |
| s | 90269 | 4.6% |
| , | 89685 | 4.6% |
| n | 87755 | 4.5% |
| Other values (82) | 661892 |
| Value | Count | Frequency (%) |
| é | 13489 | |
| ô | 1573 | 7.5% |
| è | 1037 | 4.9% |
| à | 932 | 4.4% |
| â | 599 | 2.9% |
| ï | 472 | 2.3% |
| œ | 334 | 1.6% |
| ü | 282 | 1.3% |
| É | 273 | 1.3% |
| ä | 219 | 1.0% |
| Other values (94) | 1745 | 8.3% |
| Value | Count | Frequency (%) |
| • | 174 | |
| ’ | 104 | |
| — | 33 | 9.0% |
| “ | 17 | 4.6% |
| ‘ | 16 | 4.3% |
| † | 14 | 3.8% |
| „ | 7 | 1.9% |
| ” | 1 | 0.3% |
| | 1 | 0.3% |
| ‡ | 1 | 0.3% |
| Value | Count | Frequency (%) |
| fi | 27 | |
| fl | 8 | 22.9% |
| Value | Count | Frequency (%) |
| о | 351 | 11.9% |
| а | 312 | 10.5% |
| н | 215 | 7.3% |
| е | 201 | 6.8% |
| и | 188 | 6.3% |
| л | 178 | 6.0% |
| р | 170 | 5.7% |
| т | 148 | 5.0% |
| с | 145 | 4.9% |
| к | 137 | 4.6% |
| Other values (24) | 916 |
| Value | Count | Frequency (%) |
| € | 15 |
| Value | Count | Frequency (%) |
| น | 5 | 6.7% |
| า | 5 | 6.7% |
| ร | 5 | 6.7% |
| ต | 4 | 5.3% |
| ส | 4 | 5.3% |
| ค | 4 | 5.3% |
| เ | 4 | 5.3% |
| ล | 3 | 4.0% |
| ว | 3 | 4.0% |
| ม | 3 | 4.0% |
| Other values (22) | 35 |
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 1416 |
| Missing (%) | 14.2% |
| Memory size | 78.2 KiB |
| d | |
|---|---|
| c | |
| e | |
| a | |
| b |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 8584 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | d |
|---|---|
| 2nd row | c |
| 3rd row | c |
| 4th row | d |
| 5th row | e |
| Value | Count | Frequency (%) |
| d | 2455 | |
| c | 1779 | |
| e | 1676 | |
| a | 1393 | |
| b | 1281 | |
| (Missing) | 1416 |
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| d | 2455 | |
| c | 1779 | |
| e | 1676 | |
| a | 1393 | |
| b | 1281 |
Most occurring characters
| Value | Count | Frequency (%) |
| d | 2455 | |
| c | 1779 | |
| e | 1676 | |
| a | 1393 | |
| b | 1281 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 8584 |
Most frequent character per category
| Value | Count | Frequency (%) |
| d | 2455 | |
| c | 1779 | |
| e | 1676 | |
| a | 1393 | |
| b | 1281 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 8584 |
Most frequent character per script
| Value | Count | Frequency (%) |
| d | 2455 | |
| c | 1779 | |
| e | 1676 | |
| a | 1393 | |
| b | 1281 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 8584 |
Most frequent character per block
| Value | Count | Frequency (%) |
| d | 2455 | |
| c | 1779 | |
| e | 1676 | |
| a | 1393 | |
| b | 1281 |
| Distinct | 1780 |
|---|---|
| Distinct (%) | 17.9% |
| Missing | 52 |
| Missing (%) | 0.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1128.245091 |
| Minimum | 0 |
|---|---|
| Maximum | 3768 |
| Zeros | 308 |
| Zeros (%) | 3.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 71 |
| Q1 | 389 |
| median | 1105 |
| Q3 | 1674 |
| 95-th percentile | 2389 |
| Maximum | 3768 |
| Range | 3768 |
| Interquartile range (IQR) | 1285 |
Descriptive statistics
| Standard deviation | 792.5466442 |
|---|---|
| Coefficient of variation (CV) | 0.7024596431 |
| Kurtosis | -0.403757058 |
| Mean | 1128.245091 |
| Median Absolute Deviation (MAD) | 657 |
| Skewness | 0.4355420009 |
| Sum | 11223782.17 |
| Variance | 628130.1833 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 308 | 3.1% |
| 2092 | 163 | 1.6% |
| 1674 | 159 | 1.6% |
| 1494 | 130 | 1.3% |
| 1393 | 120 | 1.2% |
| 1046 | 113 | 1.1% |
| 1644 | 111 | 1.1% |
| 1569 | 90 | 0.9% |
| 1197 | 89 | 0.9% |
| 837 | 78 | 0.8% |
| Other values (1770) | 8587 |
| Value | Count | Frequency (%) |
| 0 | 308 | |
| 0.42 | 1 | < 0.1% |
| 1 | 3 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 3768 | 1 | < 0.1% |
| 3766 | 15 | |
| 3761 | 3 | < 0.1% |
| 3749 | 1 | < 0.1% |
| 3707 | 1 | < 0.1% |
| Distinct | 1291 |
|---|---|
| Distinct (%) | 13.7% |
| Missing | 560 |
| Missing (%) | 5.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.83285781 |
| Minimum | 0 |
|---|---|
| Maximum | 100 |
| Zeros | 2285 |
| Zeros (%) | 22.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.1 |
| median | 5.36 |
| Q3 | 20 |
| 95-th percentile | 46.43 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range (IQR) | 19.9 |
Descriptive statistics
| Standard deviation | 17.41943943 |
|---|---|
| Coefficient of variation (CV) | 1.357409214 |
| Kurtosis | 6.229673009 |
| Mean | 12.83285781 |
| Median Absolute Deviation (MAD) | 5.36 |
| Skewness | 2.197307383 |
| Sum | 121142.1777 |
| Variance | 303.4368701 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 2285 | 22.9% |
| 0.5 | 171 | 1.7% |
| 0.1 | 132 | 1.3% |
| 25 | 126 | 1.3% |
| 20 | 105 | 1.1% |
| 32.14 | 97 | 1.0% |
| 28.57 | 95 | 0.9% |
| 10 | 84 | 0.8% |
| 30 | 84 | 0.8% |
| 1.79 | 77 | 0.8% |
| Other values (1281) | 6184 | |
| (Missing) | 560 | 5.6% |
| Value | Count | Frequency (%) |
| 0 | 2285 | |
| 0.007 | 1 | < 0.1% |
| 0.01 | 4 | < 0.1% |
| 0.02 | 2 | < 0.1% |
| 0.0237 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 100 | 52 | 0.5% |
| 99.7 | 1 | < 0.1% |
| 99.4 | 1 | < 0.1% |
| 99 | 1 | < 0.1% |
| 93.33 | 39 | 0.4% |
saturated-fat_100g
| Distinct | 872 |
|---|---|
| Distinct (%) | 9.8% |
| Missing | 1095 |
| Missing (%) | 10.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.957228433 |

| Minimum | 0 |
|---|---|
| Maximum | 100 |
| Zeros | 2505 |
| Zeros (%) | 25.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1.79 |
| Q3 | 7.14 |
| 95-th percentile | 19.34 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range (IQR) | 7.14 |
Descriptive statistics
| Standard deviation | 7.342593803 |
|---|---|
| Coefficient of variation (CV) | 1.481189318 |
| Kurtosis | 17.41727397 |
| Mean | 4.957228433 |
| Median Absolute Deviation (MAD) | 1.79 |
| Skewness | 2.994774817 |
| Sum | 44144.1192 |
| Variance | 53.91368376 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 2505 | 25.1% |
| 0.1 | 288 | 2.9% |
| 0.5 | 150 | 1.5% |
| 3.57 | 134 | 1.3% |
| 0.2 | 117 | 1.2% |
| 1 | 115 | 1.1% |
| 0.3 | 106 | 1.1% |
| 7.14 | 99 | 1.0% |
| 0.4 | 93 | 0.9% |
| 10 | 77 | 0.8% |
| Other values (862) | 5221 | |
| (Missing) | 1095 | 10.9% |
| Value | Count | Frequency (%) |
| 0 | 2505 | 25.1% |
| 0.0001 | 1 | < 0.1% |
| 0.001 | 3 | < 0.1% |
| 0.003 | 1 | < 0.1% |
| 0.0052 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 100 | 1 | < 0.1% |
| 92.1 | 1 | < 0.1% |
| 86.67 | 2 | < 0.1% |
| 79 | 1 | < 0.1% |
| 64.29 | 1 | < 0.1% |
carbohydrates_100g
| Distinct | 1928 |
|---|---|
| Distinct (%) | 20.4% |
| Missing | 567 |
| Missing (%) | 5.7% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 32.03619103 |

| Minimum | 0 |
|---|---|
| Maximum | 164 |
| Zeros | 733 |
| Zeros (%) | 7.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 5.7 |
| median | 20.85 |
| Q3 | 58.82 |
| 95-th percentile | 81.34 |
| Maximum | 164 |
| Range | 164 |
| Interquartile range (IQR) | 53.12 |
Descriptive statistics
| Standard deviation | 29.11406654 |
|---|---|
| Coefficient of variation (CV) | 0.9087867691 |
| Kurtosis | -1.06454764 |
| Mean | 32.03619103 |
| Median Absolute Deviation (MAD) | 19.85 |
| Skewness | 0.5516410967 |
| Sum | 302197.39 |
| Variance | 847.6288706 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 733 | 7.3% |
| 3.57 | 150 | 1.5% |
| 6.67 | 110 | 1.1% |
| 0.5 | 108 | 1.1% |
| 50 | 107 | 1.1% |
| 75 | 86 | 0.9% |
| 1 | 85 | 0.9% |
| 100 | 84 | 0.8% |
| 60 | 81 | 0.8% |
| 80 | 78 | 0.8% |
| Other values (1918) | 7811 | |
| (Missing) | 567 | 5.7% |
| Value | Count | Frequency (%) |
| 0 | 733 | 7.3% |
| 0.02 | 1 | < 0.1% |
| 0.04 | 1 | < 0.1% |
| 0.1 | 24 | 0.2% |
| 0.14 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 164 | 1 | < 0.1% |
| 100 | 84 | 0.8% |
| 99.5 | 1 | < 0.1% |
| 99.2 | 1 | < 0.1% |
| 99 | 2 | < 0.1% |
sugars_100g
| Distinct | 1488 |
|---|---|
| Distinct (%) | 15.8% |
| Missing | 583 |
| Missing (%) | 5.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.55183244 |

| Minimum | 0 |
|---|---|
| Maximum | 100 |
| Zeros | 1342 |
| Zeros (%) | 13.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1.2 |
| median | 5.29 |
| Q3 | 22.88 |
| 95-th percentile | 61.244 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range (IQR) | 21.68 |
Descriptive statistics
| Standard deviation | 20.9520734 |
|---|---|
| Coefficient of variation (CV) | 1.347241457 |
| Kurtosis | 2.52318645 |
| Mean | 15.55183244 |
| Median Absolute Deviation (MAD) | 5.29 |
| Skewness | 1.736750624 |
| Sum | 146451.6061 |
| Variance | 438.9893797 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1342 | 13.4% |
| 3.57 | 261 | 2.6% |
| 0.5 | 228 | 2.3% |
| 3.33 | 143 | 1.4% |
| 1 | 123 | 1.2% |
| 20 | 80 | 0.8% |
| 0.7 | 79 | 0.8% |
| 10 | 78 | 0.8% |
| 0.8 | 77 | 0.8% |
| 2 | 75 | 0.8% |
| Other values (1478) | 6931 | |
| (Missing) | 583 | 5.8% |
| Value | Count | Frequency (%) |
| 0 | 1342 | 13.4% |
| 0.0001 | 1 | < 0.1% |
| 0.001 | 2 | < 0.1% |
| 0.01 | 3 | < 0.1% |
| 0.02 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 100 | 32 | 0.3% |
| 99.5 | 2 | < 0.1% |
| 99 | 2 | < 0.1% |
| 98.82 | 1 | < 0.1% |
| 97.6 | 1 | < 0.1% |
fiber_100g
| Distinct | 275 |
|---|---|
| Distinct (%) | 3.7% |
| Missing | 2551 |
| Missing (%) | 25.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.90515049 |

| Minimum | -6.7 |
|---|---|
| Maximum | 99 |
| Zeros | 2487 |
| Zeros (%) | 24.9% |
| Negative | 1 |
| Negative (%) | < 0.1% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | -6.7 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1.6 |
| Q3 | 3.6 |
| 95-th percentile | 10 |
| Maximum | 99 |
| Range | 105.7 |
| Interquartile range (IQR) | 3.6 |
Descriptive statistics
| Standard deviation | 4.805121779 |
|---|---|
| Coefficient of variation (CV) | 1.654000987 |
| Kurtosis | 67.29296888 |
| Mean | 2.90515049 |
| Median Absolute Deviation (MAD) | 1.6 |
| Skewness | 5.879151637 |
| Sum | 21640.466 |
| Variance | 23.08919531 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 2487 | 24.9% |
| 3.6 | 322 | 3.2% |
| 1.8 | 153 | 1.5% |
| 0.8 | 151 | 1.5% |
| 2 | 147 | 1.5% |
| 3.3 | 140 | 1.4% |
| 7.1 | 139 | 1.4% |
| 0.5 | 134 | 1.3% |
| 6.7 | 127 | 1.3% |
| 2.4 | 125 | 1.2% |
| Other values (265) | 3524 | |
| (Missing) | 2551 | 25.5% |
| Value | Count | Frequency (%) |
| -6.7 | 1 | < 0.1% |
| 0 | 2487 | 24.9% |
| 0.01 | 1 | < 0.1% |
| 0.07 | 1 | < 0.1% |
| 0.1 | 40 | 0.4% |
| Value | Count | Frequency (%) |
| 99 | 1 | < 0.1% |
| 86.2 | 1 | < 0.1% |
| 77.8 | 1 | < 0.1% |
| 72.5 | 1 | < 0.1% |
| 66.7 | 1 | < 0.1% |
proteins_100g
| Distinct | 933 |
|---|---|
| Distinct (%) | 9.4% |
| Missing | 91 |
| Missing (%) | 0.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.13126703 |

| Minimum | 0 |
|---|---|
| Maximum | 86.36 |
| Zeros | 1895 |
| Zeros (%) | 18.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.8 |
| median | 5 |
| Q3 | 10 |
| 95-th percentile | 23.4 |
| Maximum | 86.36 |
| Range | 86.36 |
| Interquartile range (IQR) | 9.2 |
Descriptive statistics
| Standard deviation | 7.985387317 |
|---|---|
| Coefficient of variation (CV) | 1.119771183 |
| Kurtosis | 6.949528802 |
| Mean | 7.13126703 |
| Median Absolute Deviation (MAD) | 4.5 |
| Skewness | 1.96964073 |
| Sum | 70663.725 |
| Variance | 63.7664106 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1895 | 18.9% |
| 7.14 | 195 | 1.9% |
| 0.5 | 164 | 1.6% |
| 10 | 152 | 1.5% |
| 6.67 | 132 | 1.3% |
| 3.57 | 130 | 1.3% |
| 25 | 128 | 1.3% |
| 3.33 | 128 | 1.3% |
| 5 | 126 | 1.3% |
| 12.5 | 96 | 1.0% |
| Other values (923) | 6763 |
| Value | Count | Frequency (%) |
| 0 | 1895 | 18.9% |
| 0.01 | 5 | 0.1% |
| 0.02 | 2 | < 0.1% |
| 0.07 | 1 | < 0.1% |
| 0.1 | 59 | 0.6% |
| Value | Count | Frequency (%) |
| 86.36 | 1 | < 0.1% |
| 81.3 | 1 | < 0.1% |
| 80 | 1 | < 0.1% |
| 77.27 | 1 | < 0.1% |
| 73.3 | 1 | < 0.1% |
salt_100g
| Distinct | 1619 |
|---|---|
| Distinct (%) | 16.6% |
| Missing | 228 |
| Missing (%) | 2.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.595388073 |

| Minimum | 0 |
|---|---|
| Maximum | 124.46 |
| Zeros | 1282 |
| Zeros (%) | 12.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.06099 |
| median | 0.58 |
| Q3 | 1.37 |
| 95-th percentile | 3.90144 |
| Maximum | 124.46 |
| Range | 124.46 |
| Interquartile range (IQR) | 1.30901 |
Descriptive statistics
| Standard deviation | 6.608683218 |
|---|---|
| Coefficient of variation (CV) | 4.142367195 |
| Kurtosis | 153.6917019 |
| Mean | 1.595388073 |
| Median Absolute Deviation (MAD) | 0.5546 |
| Skewness | 11.62212638 |
| Sum | 15590.13225 |
| Variance | 43.67469388 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1282 | 12.8% |
| 0.01 | 183 | 1.8% |
| 0.1 | 155 | 1.6% |
| 1 | 101 | 1.0% |
| 0.03 | 82 | 0.8% |
| 0.0254 | 73 | 0.7% |
| 1.1 | 70 | 0.7% |
| 1.3 | 69 | 0.7% |
| 1.5 | 68 | 0.7% |
| 1.8 | 67 | 0.7% |
| Other values (1609) | 7622 | |
| (Missing) | 228 | 2.3% |
| Value | Count | Frequency (%) |
| 0 | 1282 | 12.8% |
| 5.2 × 10⁻⁵ | 1 | < 0.1% |
| 0.0001 | 1 | < 0.1% |
| 0.00025 | 1 | < 0.1% |
| 0.00033 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 124.46 | 1 | < 0.1% |
| 105.83418 | 2 | < 0.1% |
| 100.0125 | 1 | < 0.1% |
| 100 | 3 | < 0.1% |
| 99.90582 | 4 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
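The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report itself runs; `pearson_r` and the sample lists are hypothetical names and data chosen for the example.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x, y)
# Shifting and positively rescaling x leaves r unchanged.
r_shifted = pearson_r([3 * a + 7 for a in x], y)
print(r, r_shifted)
```

The function divides by zero when either input is constant, which matches the fact that r is undefined for a variable with zero variance.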
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
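As a rough sketch of that definition, ρ can be computed by ranking both variables and applying the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) are hypothetical, and giving tied values the average of their ranks is an assumption of this example.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x
print(spearman_rho(x, y), pearson_r(x, y))
```

For the cubic relationship in the example, ρ reaches 1 while Pearson's r stays below 1, which illustrates why ρ is better suited to nonlinear monotonic relationships.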
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
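The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements it literally; `kendall_tau_a` is a hypothetical name. Statistical libraries often report a tie-adjusted variant (tau-b) instead, so values may differ from the report when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = 0
    discordant = 0
    total = 0
    for i, j in combinations(range(len(x)), 2):
        total += 1
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # Pairs tied in x or y count toward the total only.
    return (concordant - discordant) / total


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

This direct version compares every pair and so takes quadratic time, which is acceptable for illustration but slow for large samples.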
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
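To make the nullity correlation concrete, the sketch below assumes it is the Pearson correlation between presence indicators (1 if a value is present, 0 if it is missing). It uses the `sugars_100g` and `fiber_100g` values of the ten rows listed under First rows; the function names are hypothetical.

```python
import math

nan = float("nan")
# sugars_100g and fiber_100g for the ten rows listed under "First rows".
sugars = [20.29, 10.00, nan, 29.60, 30.56, 30.90, 0.00, 2.40, 0.20, 14.17]
fiber = [0.0, 0.0, nan, 3.2, 0.0, nan, 0.0, 0.8, 0.0, nan]


def presence(values):
    """1.0 where a value is present, 0.0 where it is missing (NaN)."""
    return [0.0 if math.isnan(v) else 1.0 for v in values]


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


nullity_corr = pearson_r(presence(sugars), presence(fiber))
print(f"nullity correlation (10-row sample): {nullity_corr:.3f}")
```

A positive value means the two columns tend to be present or missing together. Ten rows are far too few to draw conclusions about the full dataset; the sample only illustrates the mechanics.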
First rows
| | df_index | Unnamed: 0 | product_name | brands | ingredients_text | nutrition_grade_fr | energy_100g | fat_100g | saturated-fat_100g | carbohydrates_100g | sugars_100g | fiber_100g | proteins_100g | salt_100g |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 92370 | 95036 | Premium Ice Cream | Casper's Ice Cream Inc. | Fresh whole milk, sugar, cream, mint flake (corn syrup, sugar, partially hydrogenated soybean oil, red lake 40, green 3, red 40, peppermint oil, soy lecithin), corn syrup solids, non-fat dry milk, stabilizer (microcrystalline cellulose, cellulose gum, mon | d | 908.0 | 10.14 | 5.80 | 28.99 | 20.29 | 0.0 | 2.90 | 0.16510 |
| 1 | 145467 | 150904 | Kohler, Strawberry Balsamic Rare Facets Chocolate | Kohler Original Recipe Chocolates | 61% chocolate (cocoa beans, pure cane sugar, cocoa butter, soy lecithin, vanilla bean), milk chocolate (pure cane sugar, full cream milk, cocoa butter, cocoa beans, soy lecithin, vanilla bean), strawberry puree (strawberry 87%, invert sugar syrup), 55% ch | c | 314.0 | 5.00 | 2.50 | 10.00 | 10.00 | 0.0 | 0.00 | 0.00000 |
| 2 | 100452 | 103365 | Center Cut Chops Boneless Thin Pork | Target Stores | Pork, *solution ingredients: water, potassium lactate, sodium phosphates, salt, diacetate. | NaN | 640.0 | 7.06 | 2.35 | 0.00 | NaN | NaN | 18.82 | 0.71628 |
| 3 | 172539 | 179915 | Corn Flakes Glacés au sucre | Crownfield | Maïs 77% Sucre 28% Extrait de Malte d'orge Sel | c | 1597.0 | 0.30 | 0.10 | 86.10 | 29.60 | 3.2 | 5.70 | 0.73000 |
| 4 | 56595 | 57919 | Homekist, Sandwich Cookies, Lemon Creme | Vista Bakery Inc. | Enriched flour (wheat flour, niacin, reduced iron, thiamine mononitrate, riboflavin, folic acid), sugar, vegetable oil (contains one or more of the following: canola oil, corn oil, palm oil, soybean oil), dextrose, high fructose corn syrup, corn syrup, co | d | 1975.0 | 19.44 | 5.56 | 72.22 | 30.56 | 0.0 | 5.56 | 0.67056 |
| 5 | 172980 | 180553 | Biscino Chocolat Noir | Sondey | NaN | e | 2054.0 | 23.30 | 14.80 | 60.50 | 30.90 | NaN | 6.90 | 0.38000 |
| 6 | 53491 | 54715 | Turkey Franks | Jennie-O, Jennie-O Turkey Store Inc. | Mechanically separated turkey, water, salt, contains 2% or less modified food starch, potassium lactate, potassium acetate, sodium diacetate, seasoning (corn syrup solids, dextrose, sugar, paprika, sodium erythorbate, spice extractives), natural smoke fla | d | 895.0 | 17.86 | 4.46 | 1.79 | 0.00 | 0.0 | 12.50 | 2.90322 |
| 7 | 14038 | 14545 | Country Blend Mixed Vegetables | Big Y | Water, carrots, potatoes, celery, sweet peas, green beans, corn, lima beans, salt, calcium chloride (to maintain firmness), onion flavoring. | b | 134.0 | 0.00 | 0.00 | 6.40 | 2.40 | 0.8 | 0.80 | 0.58928 |
| 8 | 179148 | 190504 | Marc Guiselin | Aldi | NaN | e | 3048.0 | 82.00 | 56.40 | 0.20 | 0.20 | 0.0 | 0.60 | 0.25400 |
| 9 | 53240 | 54460 | Juice Cocktail From Concentrate, Cranberry | Best Yet, S. M. Flickinger Co. Inc. | Filtered water, high fructose corn syrup, cranberry juice concentrate, ascorbic acid (vitamin c). | NaN | 243.0 | 0.00 | NaN | 14.17 | 14.17 | NaN | 0.00 | 0.03810 |
Last rows
| | df_index | Unnamed: 0 | product_name | brands | ingredients_text | nutrition_grade_fr | energy_100g | fat_100g | saturated-fat_100g | carbohydrates_100g | sugars_100g | fiber_100g | proteins_100g | salt_100g |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9990 | 226067 | 250928 | Mûre Framboise Cranberry | Les 4 Saisons | Mélange de fruits (Mûre, Framboise et Cranberry), sucre | d | 1016.0 | 0.40 | 0.00 | 60.00 | 59.00 | NaN | 0.80 | 0.00000 |
| 9991 | 8544 | 8767 | Chocolate Peanut Clusters | Sweet Smiles, New England Confectionery Company Inc. | Peanuts, milk chocolate [sugar, cocoa butter, chocolate liquor, milk, soy lecithin, vanillin (artificial flavor)]. | e | 2987.0 | 50.00 | 14.29 | 59.52 | 45.24 | 4.80 | 19.05 | 0.21082 |
| 9992 | 16389 | 16949 | Double"Q", Wild Alaskan Skinless & Boneless Pink Salmon | NaN | Pink salmon, salt. | a | 464.0 | 2.38 | 0.00 | 0.00 | 0.00 | 0.00 | 19.05 | 0.88646 |
| 9993 | 235161 | 262145 | Mogettes cuisinées à base de Mogette de Vendée Label Rouge | Nos régions ont du talent | Eau, Mogette de Vendée Label Rouge sèche trempée (38%), carottes (7,6%), lardons cuits fumés rissolés (poitrine de porc, sel) (3%), sel, fond de porc (arômes naturels, graisse et extrait de porc, eau, sel, sucre), aromates (ail, thym, laurier en poudre). | a | 280.0 | 0.90 | 0.20 | 7.30 | 0.90 | 4.00 | 5.30 | 0.94800 |
| 9994 | 20270 | 20897 | Petite Cut Diced Tomatoes With Zesty Jalapenos | Del Monte | Tomatoes, tomato juice, jalapeno peppers, contains less than 1% of the following: salt, dehydrated onions, distilled vinegar, citric acid, spices, garlic powder, calcium chloride, natural flavor, onion powder | b | 100.0 | 0.00 | 0.00 | 4.76 | 2.38 | 0.80 | 0.79 | 0.74676 |
| 9995 | 287019 | 339514 | Tortas De Maiz Gullón S / Gluten 130GR | MILHO CORN | NaN | a | 109.0 | NaN | 0.10 | NaN | 0.60 | 1.90 | 8.00 | 1.30000 |
| 9996 | 22123 | 22795 | 2% Milk Fat Reduced Fat Milk | Darigold | Reduced fat milk, dha algal oil‡, vitamin a palmitate, vitamin d3. | b | 226.0 | 2.08 | 1.25 | 5.42 | 5.00 | 0.00 | 3.33 | 0.13716 |
| 9997 | 247194 | 278537 | Apéro Tapas Aux Ténébrions Et Aux Grillons Bio & Naturellement Sans Gluten | Jimini s | INGRÉDIENTS ténébrions" entiers déshydratés (ténébrion Molitor ) grillons entier déshydratés oignon , betterave rouge dextrose ail poivre noir thym persil origan basilic ciboulette huile de tournesol | d | 2033.0 | 27.81 | 6.92 | 5.80 | 0.70 | 6.21 | 53.07 | 4.68000 |
| 9998 | 95617 | 98326 | Thin Wheat Baked Snack Crackers | Great Value, Wal-Mart Stores Inc. | Whole grain wheat flour, sugar, soybean oil, salt, cornstarch, malt syrup (from corn and barley), invert sugar, vegetable color (annatto extract and turmeric oleoresin), baking soda. | c | 1732.0 | 10.34 | 0.00 | 75.86 | 17.24 | 10.30 | 6.90 | 1.57734 |
| 9999 | 123798 | 128798 | Poker Wafer, Vanilla Cream | Gastone Lago Elledi | Wheat flour, sugar, vegetable oils (palm, coconut), whey powder, soy lecithin, salt, sodium hydrogen carbonate, vanilla extract. | e | 2230.0 | 26.67 | 20.00 | 66.67 | 43.33 | 0.00 | 3.33 | 0.38100 |